Information theory, pattern recognition, and neural networks


• Source coding (Data compression)
• Noisy-channel coding
• Inference + probabilistic methods
  » 9-10 Inference
  » 11 Clustering
  » 12 Monte Carlo methods
  » 13 Advanced Monte Carlo methods
  » 14 Variational methods
• Neural networks
  » 15 Introduction to feedforward neural networks
  » 16 Content-addressable memories
• State-of-the-art error-correcting codes


www.inference.phy.cam.ac.uk/itprnn/ 
www.inference.phy.cam.ac.uk/itila/ 


Reading 


• Neural networks
  » Chs 38, 39, (& perhaps 41, 44), 42

• State-of-the-art error-correcting codes
  » Chs 47, (48, 49,) & 50


Brains 


How to make a content-addressable memory? 


Content-addressable memory


Oscar-nominated actress called Natalie P****** 
Acted a senator in an amazingly good Lucas film 


Content-addressable memory 


An author 


Content-addressable memory 


• How to do it?
• How do brains do it?


Brains 


• Shown 725 images,
  instructed to memorize labels "+/-"


Fersen, L. & Delius, J. D. (1989). Long-term retention of many visual patterns by pigeons. Ethology 82: 141-155.


What is the difference between a pigeon and a supercomputer?


Parallel Distributed Processing


Neural Networks, Part I: 


Feedforward Networks 


Chapters 39, 41, 44


• Single neuron as a binary classifier
• Multi-layer network for regression
• Probabilistic interpretation of learning


Learning for a Single Neuron 


cd itp/neuron 
gnuplot 
load 'LDEMO' 


w = (-15.0,1.5,1.0) 


Q: What happens if all three weights are doubled? 
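
One way to explore the question is to evaluate the neuron directly. The sketch below assumes the neuron computes y = 1/(1 + exp(-a)) with a = w0 + w1*x1 + w2*x2, where w0 acts as a bias. The slide does not spell out this convention, so treat it, the function name `neuron_output`, and the two test points as assumptions made for illustration.

```python
import math

def neuron_output(w, x1, x2):
    """Logistic neuron: y = 1 / (1 + exp(-a)), a = w0 + w1*x1 + w2*x2.

    Treating w[0] as a bias is an assumption, not stated on the slide.
    """
    a = w[0] + w[1] * x1 + w[2] * x2
    return 1.0 / (1.0 + math.exp(-a))

w = (-15.0, 1.5, 1.0)
w_doubled = tuple(2.0 * wi for wi in w)

# (6, 6) lies on the line -15 + 1.5*x1 + 1.0*x2 = 0, so a = 0 for both weight vectors.
on_boundary = (6.0, 6.0)
# (7, 6) lies just on the positive side of that line.
near_boundary = (7.0, 6.0)

for name, weights in (("original", w), ("doubled", w_doubled)):
    print(name,
          neuron_output(weights, *on_boundary),
          round(neuron_output(weights, *near_boundary), 3))
```

Under these assumptions, doubling all three weights leaves the line a = 0 where it was (the output there is still 0.5) and makes the output change more steeply as the input moves away from that line.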
w = (-15.0,2.0,1.0)


[Figures: training of the single neuron; function after 30, 10000, and 40000 iterations; E_w(w) versus time (log axis, 1 to 100000).]


Learning with weight decay, alpha=0.01


[Figures: learning with alpha=0.01 cf. alpha=0, versus time (log axis, 1 to 100000); alpha=0 after 40000 iterations; solution with alpha=0.01.]

Single Neuron for digit classification 


Trained to distinguish 2s from 3s 



Error rate: 10% 


Capacity of a neuron 


Learning as communication


• How many bits can a neuron learn?
  A single neuron with K inputs can almost certainly
  memorize up to N = 2K random binary labels.


(Chapter 40)


So the capacity of a neuron is two bits per connection.
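
The N = 2K figure can be checked numerically with the standard function-counting result: for N points in general position and a threshold neuron with K weights and no separate bias, the number of realizable labellings is T(N, K) = 2 * sum_{k=0}^{K-1} C(N-1, k). The sketch below assumes that setting; the function names are hypothetical.

```python
from math import comb

def count_threshold_functions(n_points, k_inputs):
    """T(N, K) = 2 * sum_{k=0}^{K-1} C(N-1, k), for points in general position."""
    return 2 * sum(comb(n_points - 1, k) for k in range(k_inputs))

def fraction_memorizable(n_points, k_inputs):
    """Fraction of the 2^N binary labellings that the neuron can realize."""
    return count_threshold_functions(n_points, k_inputs) / 2 ** n_points

K = 100
for N in (100, 150, 200, 250, 300):
    print(N, fraction_memorizable(N, K))
```

The fraction is 1 for N <= K, exactly 1/2 at N = 2K, and falls off quickly beyond that, which is the sense in which a neuron can "almost certainly" memorize up to 2K random labels when K is large.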


Multi-layer network for regression 


cd itp/neuron/net 
octave 
DEMO 
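
The Octave demo itself is not reproduced here. As a rough stand-in, the following sketch fits a small network (one input, a few tanh hidden units, one linear output) by steepest descent on a sum-squared error plus weight decay. The architecture, data set, step size `eta`, and all function names are assumptions chosen for illustration, and may differ from what DEMO does.

```python
import math
import random

def init_params(n_hidden, rng):
    """One input, n_hidden tanh units, one linear output."""
    return {
        "w1": [rng.gauss(0.0, 1.0) for _ in range(n_hidden)],  # input -> hidden
        "b1": [rng.gauss(0.0, 1.0) for _ in range(n_hidden)],  # hidden biases
        "w2": [rng.gauss(0.0, 1.0) for _ in range(n_hidden)],  # hidden -> output
        "b2": 0.0,                                             # output bias
    }

def forward(params, x):
    h = [math.tanh(w * x + b) for w, b in zip(params["w1"], params["b1"])]
    y = params["b2"] + sum(v * hj for v, hj in zip(params["w2"], h))
    return y, h

def objective_and_gradient(params, data, alpha):
    """M = 0.5 * sum (y - t)^2 + 0.5 * alpha * sum(weights^2), and its gradient."""
    n = len(params["w1"])
    grad = {"w1": [0.0] * n, "b1": [0.0] * n, "w2": [0.0] * n, "b2": 0.0}
    m = 0.0
    for x, t in data:
        y, h = forward(params, x)
        err = y - t
        m += 0.5 * err * err
        grad["b2"] += err
        for j in range(n):
            grad["w2"][j] += err * h[j]
            back = err * params["w2"][j] * (1.0 - h[j] ** 2)  # back through tanh
            grad["w1"][j] += back * x
            grad["b1"][j] += back
    for key in ("w1", "b1", "w2"):               # weight decay (output bias excluded)
        for j in range(n):
            m += 0.5 * alpha * params[key][j] ** 2
            grad[key][j] += alpha * params[key][j]
    return m, grad

def steepest_descent(params, data, alpha=0.01, eta=0.002, n_iter=3000):
    history = []
    for _ in range(n_iter):
        m, grad = objective_and_gradient(params, data, alpha)
        history.append(m)
        for key in ("w1", "b1", "w2"):
            params[key] = [p - eta * g for p, g in zip(params[key], grad[key])]
        params["b2"] -= eta * grad["b2"]
    return params, history

# Hypothetical 1-D data set: nine points on a sine curve.
DATA = [(x / 4.0, math.sin(3.0 * x / 4.0)) for x in range(-4, 5)]
rng = random.Random(0)
params, history = steepest_descent(init_params(4, rng), DATA)
print("M at start:", round(history[0], 3), " M at end:", round(history[-1], 3))
```

Plotting `history` against iteration number on a log axis gives the kind of "objective versus time" curve that the slides show.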


[Figures: steepest descents after 0, 1, and 9999 iterations; a plot against number of iterations; steepest descents (alpha=1.00) after 9999 iterations; solutions with alpha=0.1 and alpha=0.01; optimized solution for alpha=0.01.]


Langevin method


• Simple version of Hamiltonian Monte Carlo
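
A minimal sketch of the idea: the Langevin method can be written as Hamiltonian Monte Carlo with a single leapfrog step per proposal, followed by a Metropolis accept/reject test. The example below applies it to a single logistic neuron with objective M(w) = G(w) + alpha * E_W(w), sampling from a density proportional to exp(-M(w)). The toy data, step size `eps`, and function names are assumptions for illustration only.

```python
import math
import random

def sigmoid(a):
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def objective(w, data, alpha):
    """M(w) = G(w) + alpha * E_W(w), with E_W = 0.5 * sum(w_i^2)."""
    g = 0.0
    for x, t in data:
        a = sum(wi * xi for wi, xi in zip(w, x))
        s = a if t == 1 else -a                  # -log p(t | x, w) = log(1 + exp(-s))
        g += math.log1p(math.exp(-s)) if s > -30.0 else -s
    return g + 0.5 * alpha * sum(wi * wi for wi in w)

def gradient(w, data, alpha):
    grad = [alpha * wi for wi in w]
    for x, t in data:
        y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for i, xi in enumerate(x):
            grad[i] -= (t - y) * xi
    return grad

def langevin(data, alpha=0.01, eps=0.1, n_iter=2000, seed=0):
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    m, g = objective(w, data, alpha), gradient(w, data, alpha)
    samples, accepts = [], 0
    for _ in range(n_iter):
        p = [rng.gauss(0.0, 1.0) for _ in w]                 # fresh momentum
        h = 0.5 * sum(pi * pi for pi in p) + m
        p = [pi - 0.5 * eps * gi for pi, gi in zip(p, g)]    # half step in p
        w_new = [wi + eps * pi for wi, pi in zip(w, p)]      # full step in w
        g_new = gradient(w_new, data, alpha)
        p = [pi - 0.5 * eps * gi for pi, gi in zip(p, g_new)]  # half step in p
        m_new = objective(w_new, data, alpha)
        dh = 0.5 * sum(pi * pi for pi in p) + m_new - h
        if dh < 0 or rng.random() < math.exp(-dh):           # Metropolis test
            w, m, g = w_new, m_new, g_new
            accepts += 1
        samples.append(list(w))
    return samples, accepts

# Hypothetical toy data: x = (1, x1, x2), target t in {0, 1}.
DATA = [
    ((1.0, -1.0, -0.5), 0), ((1.0, -0.5, -1.0), 0), ((1.0, -1.0, -1.0), 0),
    ((1.0, 1.0, 0.5), 1), ((1.0, 0.5, 1.0), 1), ((1.0, 1.0, 1.0), 1),
]
samples, accepts = langevin(DATA)
print("accepted", accepts, "of", len(samples), "proposals")
```

Averaging the neuron's output over a thinned subset of `samples` gives a Monte Carlo estimate of the Bayesian predictions, as opposed to predictions made from a single optimized weight vector.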
[Figures: Dumb Metropolis cf. Langevin; optimized solution for alpha=0.01; path (w1,w2) of optimizer, alpha=0.01; Langevin: first 300 states; after 40000 iterations.]


[Figures: weights versus time, Langevin method (0 to 40000 iterations); every 100th sample from 10000 to 40000; Langevin method: 30 samples.]


Bayesian predictions


[Figures: G(w) versus time, Langevin cf. optimizer (log axis, 1 to 100000); Gaussian approximation cf. Langevin; predictions using optimized params; Bayesian predictions, Gaussian approximation.]


Feedforward neural networks 


• Just high-dimensional curve-fitting?
• Brains do more




Applications 


• Reading aloud - NETtalk
  » Sejnowski and Rosenberg
• Handwriting recognition - LeNet
  » Yann LeCun
• Weld toughness
  » Bhadeshia et al
• Focussing multiple-mirror telescopes
  » Roger Angel


[Figures: Langevin Monte Carlo (6353 accepts, 8222 rejects), 15075 iterations; 12 samples; 12 samples -> mean, error bars (one s.d.).]

[Figure: Siemens arc welding facility]




[Figure: data set, likelihood, and probability of parameters, shown for N=0 and N=2.]
Brains 


How to make a content-addressable memory? 


Shaken,


Content-addressable memory challenge


• Make a dynamical system
  » 25 dynamical variables, 300 parameters (4-bit precision)
  » that has attracting fixed points at desired memories
  » such that noisy versions are automatically cleaned up
  » and such that new memories can be added incrementally
• AND robust to corruption of more than half the parameters
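
One system whose counts match the challenge is a Hopfield-style network: 25 binary units have 25*24/2 = 300 symmetric pairwise weights. The slide does not name a solution, so this pairing is an assumption, and the sketch below is only a small illustration with two hand-picked memories. It shows incremental storage and clean-up of a noisy cue; it does not test robustness to corrupted parameters.

```python
def new_weights(n):
    """Symmetric weight matrix with zero diagonal: n*(n-1)/2 free parameters."""
    return [[0] * n for _ in range(n)]

def add_memory(weights, pattern):
    """Hebbian update w_ij += x_i * x_j, so memories can be added one at a time."""
    n = len(pattern)
    for i in range(n):
        for j in range(n):
            if i != j:
                weights[i][j] += pattern[i] * pattern[j]

def recall(weights, state, max_sweeps=10):
    """Asynchronous updates x_i <- sign(sum_j w_ij * x_j) until nothing changes."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i, row in enumerate(weights):
            a = sum(w * x for w, x in zip(row, state))
            new = 1 if a >= 0 else -1
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:
            break
    return state

N = 25
memory_a = [1] * N                     # hypothetical memory 1
memory_b = [1] * 12 + [-1] * 13        # hypothetical memory 2

weights = new_weights(N)
add_memory(weights, memory_a)
add_memory(weights, memory_b)          # second memory added incrementally

noisy_b = list(memory_b)
for i in (0, 1, 20):                   # flip three of the 25 bits
    noisy_b[i] = -noisy_b[i]

print(recall(weights, noisy_b) == memory_b)   # prints True
```

Here the stored patterns are fixed points of the update rule, and the noisy cue falls back onto the nearby memory. The weights in this toy case only take the values -2, 0, and 2, so they fit easily within a few bits each.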


